Text Classification into Abstract Classes Based on Discourse Structure

Authors

  • Boris A. Galitsky
  • Dmitry I. Ilvovsky
  • Sergei O. Kuznetsov
Abstract

The problem of classifying text according to whether it belongs to a document or a meta-document is formulated, and its application areas are proposed. An algorithm is proposed for document classification tasks where word counts are insufficient to differentiate between such abstract classes of text as metalanguage and object-level. We extend the parse tree kernel method from the level of individual sentences to the level of paragraphs, based on anaphora, rhetoric structure relations, and communicative actions linking phrases in different sentences. A tree kernel learning technique is applied to these extended trees to leverage the additional discourse-related information. We evaluate our approach in the domain of action-plan documents.
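The abstract does not say which tree kernel is used, so the following is only a minimal sketch of one classic choice, the subset-tree convolution kernel, which counts the tree fragments two parse trees share. The nested-tuple tree encoding, the function names, and the toy parse trees are assumptions made for illustration. The paragraph-level extension (linking sentence trees through anaphora, rhetoric relations, and communicative actions) is not shown.

```python
from itertools import product

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf word is a plain string.

def nodes(tree):
    """Yield every internal node (tuple) of the tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """Label of the node plus (is_internal, label_or_word) for each child."""
    kids = tuple(
        (True, c[0]) if isinstance(c, tuple) else (False, c) for c in node[1:]
    )
    return (node[0],) + kids

def common_fragments(n1, n2, decay):
    """Decay-weighted count of shared fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    # Equal productions guarantee that children line up pairwise.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Sum shared-fragment counts over all pairs of nodes."""
    return sum(
        common_fragments(a, b, decay) for a, b in product(nodes(t1), nodes(t2))
    )

# Toy parse trees (hypothetical, hand-written).
t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))

print(tree_kernel(t1, t2))  # similarity between the two sentences
print(tree_kernel(t1, t1))  # self-similarity, useful for normalization
```

A kernel value of this kind can be passed to any kernel-based learner, such as a support vector machine. Setting `decay` below 1 reduces the weight of large fragments.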

Similar resources

Exploiting Discourse Relations for Sentiment Analysis

The overall sentiment of a text is critically affected by its discourse structure. By splitting a text into text spans with different discourse relations, we automatically train the weights of different relations in accordance with their importance, and then make use of discourse structure knowledge to improve sentiment classification. In this paper, we utilize explicit connectives to predict d...

Classifying Discourse Relations

Mridhula Raghupathy & Hena Mehta. Faculty Advisors: Dr. Aravind Joshi, Dr. Ani Nenkova, & Dr. Alan Lee. Abstract: The goal of this project was to study properties of discourse relations as they appear in the Penn Discourse Tree Bank (PDTB), a large corpus of naturally occurring text whose discourse relations and their fe...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners

The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...

Going beyond sentences when applying tree kernels

We go beyond the level of individual sentences applying parse tree kernels to paragraphs. We build a set of extended trees for a paragraph of text from the individual parse trees for sentences and learn short texts such as search results and social profile postings to take advantage of additional discourse-related information. Extension is based on coreferences and rhetoric structure relations ...

Publication date: 2015